Abstract

We model the vague-to-crisp dynamics of forming percepts in the brain
by combining two methodologies: dynamic logic (DL) and the operant
learning process. Forming percepts upon the presentation of visual
inputs is likened to model selection based on sampled evidence. Our
framework utilizes DL in selecting the correct “percept” among
competing ones, but uses an intrinsic reward mechanism to allow
stochastic online updates in lieu of performing the optimization step
of the DL framework. We discuss the connection of our framework with
cognitive processing and the intentional neurodynamic cycle.

Introduction

Dynamic logic (DL) learning is a computational theory of cognitive
processes (Perlovsky 2001) that emphasizes the “vague-to-crisp” aspect
of perceptual and cognitive processing. DL postulates that the internal
concepts (models), which correspond to various perceptual categories,
are learned by way of a gradual transition from fuzzy and uncertain
(“vague”) initial models to sharp and definite (“crisp”) models. Such a
vague-to-crisp transition mechanism presumably provides computational
efficiency by avoiding the computational complexity involved in
associating perceptual inputs with internal models. The DL mechanism is
postulated to exist due to the need of an organism to satisfy the
“knowledge instinct” (KI), which is an intrinsic motivation of the
organism and is satisfied when the similarity between the models and
the perceptions is maximized. DL is mathematically formulated in a
Bayesian learning framework and is related to statistical parameter
estimation in mixture models; under certain conditions it can be
considered as maximum likelihood joint parameter estimation and model
selection. DL has resulted in improvements in signal processing
(Perlovsky 2010).

Operant learning (OL) is a theoretical framework for adaptive learning
by an intelligent agent as it interacts with its environment. It is
characterized by selecting the next action based on the current action
probabilities, with subsequent adjustment of the probabilities based on
a reinforcement signal. The probability of the currently selected
action changes based on the strength of the reinforcement. At the same
time, the probabilities of the alternative actions change only due to
normalization, which ensures that the action probabilities sum to one.
The OL algorithm differs from Bayesian learning, where all action
probabilities are adjusted by computing the posterior probability based
on the current input. However, previous work (Zhang 2009a, b) showed
that these two styles of learning are closely related at the level of
ensembles of learning agents.

In this work, extending our preliminary results (Ilin et al. 2011), we
show that DL and OL dynamics can be merged into a unified computational
framework. Accordingly, the original DL batch learning algorithm is
modified (1) to achieve online processing, and (2) to use OL-style
updates. The association weights between inputs and models are adjusted
using the OL procedure, where the reinforcement signal corresponds to
the similarity between the selected model and the current input, while
the model parameters are also adjusted using an OL-style procedure. We
demonstrate the operation of the new algorithm using numeric examples.
This combined algorithm provides an alternative formulation of the
vague-to-crisp DL process, with the added advantage of reducing
computational complexity and realizing online processing.

Methodology 

Dynamic  logic  framework 

As mentioned in the introduction, DL is a computational theory of
perceptual and cognitive processing in the brain. It is built on the
idea of internal models that correspond to both tangible and abstract
concepts in the environment. The internal models form a
hetero-hierarchy modeling the relationships existing between the
concepts. In the absence of sensory input, the models exist in a
“vague” form. They are actualized in a “crisp” form by coming into
contact with the sensor data. The actualization of the models causes
them to adjust to sensory input and form the best possible match. The
DL theory postulates that the brain has a constant need to increase the
match between its internal models and the environment. This need is
called the knowledge instinct (Perlovsky 2001). Driven by the knowledge
instinct, the brain is in a perpetual process of learning and improving
its internal models.

Mathematically, the internal world of an agent is described as a set of
parametric statistical models. Consider a finite set of possible
concepts, of size $H$. Denote the internal model corresponding to a
concept $h$ by $M_h$. The match (called “similarity” below) between a
model $M_h$ and an individual sensory input $x_n$ is given by the joint
probability density function $g(x_n, M_h)$. Each model $M_h$ depends on
a set of parameters $S_h$; the weights that associate inputs with
models (the “association weights”) are computed from the similarities
in Eq. (3). Denote the set of all sensor inputs by
$X = \{x_n, n = 1, \ldots, N\}$. The total similarity between all
models and all inputs is given by the data likelihood


$$L(X, M(S)) = \prod_{n=1}^{N} \sum_{h=1}^{H} g(x_n, M_h). \tag{1}$$

Here, $N$ and $H$ are the total number of inputs and models,
respectively. Using Bayes' theorem, the similarity can be expressed
through the a priori probability $p_h$ of model $M_h$ and the
conditional density of the data given the model:

$$g(x_n, M_h) = p_h \, g(x_n | M_h) \tag{2}$$

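Note that here the “a priori probability” $p_h$ is used with respect to
the gathering of sensory input $x_n$ at a particular moment: it is the
probability that model $M_h$ is true (but with yet-to-be-specified
parameters $S_h$) prior to receiving $x_n$ among a stream of sensory
inputs. Equation (1) is maximized by iteratively computing the
following quantities (Perlovsky 2001):

$$f_{hn} = \frac{g(x_n, M_h)}{\sum_{h'=1}^{H} g(x_n, M_{h'})}, \quad \forall n, h \tag{3}$$

$$p_h^{t+1} = p_h^{t} + \varepsilon_r \sum_{n=1}^{N} \frac{f_{hn}}{p_h^{t}}, \quad \forall h \tag{4}$$

$$S_h^{t+1} = S_h^{t} + \varepsilon \sum_{n=1}^{N} f_{hn} \frac{\partial \log g(x_n, M_h)}{\partial S_h}, \quad \forall h \tag{5}$$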

In Eqs. (3)-(5), $t$ is the iteration number, and $\varepsilon$ and
$\varepsilon_r$ are the learning rates. The association weight $f_{hn}$
between the model $M_h$ and the current input $x_n$ is computed in
Eq. (3) based on the parameter estimates of the current model.
Equation (4) adjusts the a priori probabilities. Equation (5) adjusts
the parameter estimates by weighted gradient ascent using the current
association weights. The vague-to-crisp process is achieved by proper
initialization of the models and, optionally, by introducing additional
parameters controlling the fuzziness of the models. Note that the
association weights are the posterior probabilities of the models given
input $x_n$.

The models are initialized to have comparable similarity to the
incoming input, and therefore, initially the input is associated with
more than one model. After the models are properly learned, a new input
is correctly associated with the model that results in the maximum
similarity or, equivalently, has an association weight close to 1. In
(3)-(5), DL is described as a batch algorithm. This formulation has
been used successfully in many applications, including the detection
and tracking of targets (Deming 1998; Deming et al. 2007a, b; Perlovsky
and Deming 2007; Ilin and Deming 2010). Appendix 1 provides more
information on how the iterative procedure is derived.
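The association step of Eq. (3) can be illustrated with a minimal sketch. It assumes one-dimensional zero-mean Gaussian models that differ only in their standard deviation; the function names, the priors, and the sigma values are illustrative choices, not taken from the paper. When the two candidate models are nearly alike ("vague"), every input is shared almost equally between them; when they differ strongly ("crisp"), the weights move toward 0 or 1.

```python
import numpy as np

def association_weights(similarity):
    """Eq. (3): normalize g(x_n, M_h) over the models h for every input n.

    similarity: array of shape (N, H) holding g(x_n, M_h) >= 0.
    Returns f with shape (N, H); each row sums to one.
    """
    similarity = np.asarray(similarity, dtype=float)
    return similarity / similarity.sum(axis=1, keepdims=True)

def gaussian_similarity(x, priors, sigmas):
    """g(x_n, M_h) = p_h * N(x_n; 0, sigma_h^2), as in Eq. (2).

    Assumption for illustration: 1-D zero-mean Gaussian models.
    """
    x = np.asarray(x, dtype=float)[:, None]            # shape (N, 1)
    sigmas = np.asarray(sigmas, dtype=float)[None]     # shape (1, H)
    dens = np.exp(-0.5 * (x / sigmas) ** 2) / (np.sqrt(2.0 * np.pi) * sigmas)
    return np.asarray(priors, dtype=float)[None] * dens

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)

# Two nearly identical models: weights stay close to 1/2 ("vague").
f_vague = association_weights(gaussian_similarity(x, [0.5, 0.5], [3.0, 3.5]))
# Two clearly different models: weights move toward 0 or 1 ("crisp").
f_crisp = association_weights(gaussian_similarity(x, [0.5, 0.5], [1.0, 10.0]))
```

Comparing `f_vague.max(axis=1).mean()` with `f_crisp.max(axis=1).mean()` gives a rough measure of how sharply inputs are assigned in the two settings.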
Operant learning framework

Operant learning (OL), or operant conditioning, is based on the process
by which an animal modifies its behavior as a result of experiencing
the consequences of its past behavior. OL assumes that, in a given
context, the animal selects an action solely on the basis of the action
probabilities learned from past experience. Mathematically, given a
repertoire of $H$ actions with corresponding action probabilities
$p_h$, the probability of the selected action $h$ is adjusted as
follows

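$$p_h^{n+1} = p_h^{n} + \varepsilon \theta_h (1 - p_h^{n}), \tag{6}$$

while the probabilities of the other actions $h' \neq h$ are given by

$$p_{h'}^{n+1} = p_{h'}^{n} - \varepsilon \theta_h p_{h'}^{n}. \tag{7}$$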

Here $\varepsilon$ is the learning rate and $\theta_h$ is the
reinforcement signal received after action $h$ is selected. The index
$n$ refers to the time step of the algorithm execution. These formulas
express the idea that the probability of the currently selected action
increases in proportion to the reward received after the action is
selected. The probabilities of all other actions decrease so that the
probabilities still sum to one (normalization effect). The reward
signal $\theta_h$ must be bounded from above so that all probabilities
stay between zero and one; specifically, Eqs. (6)-(7) require
$\varepsilon \theta_h \le 1$. Usually, the learning rate $\varepsilon$
is chosen small enough to satisfy this condition. Even if wrong actions
are occasionally selected and possibly rewarded, the right action
choice will be learned after many trials, provided that the learning
rate is sufficiently small (Zhang 2009a).
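A single operant update of the form (6)-(7) can be sketched as follows. The function name and the numbers in the usage lines are illustrative assumptions; the check on `eps * theta` reflects the requirement that the probabilities remain between zero and one.

```python
import numpy as np

def operant_update(p, h, theta, eps):
    """Apply Eqs. (6)-(7) once: reinforce action h, shrink all other actions."""
    p = np.asarray(p, dtype=float)
    step = eps * theta
    if step < 0.0 or step > 1.0:
        raise ValueError("eps * theta must lie in [0, 1]")
    new_p = p * (1.0 - step)                 # Eq. (7), applied to every action
    new_p[h] = p[h] + step * (1.0 - p[h])    # Eq. (6) overrides the selected action
    return new_p

p = np.array([1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0])
p_next = operant_update(p, h=0, theta=1.0, eps=0.1)
```

With these illustrative values the selected action moves from 1/3 to 0.4 and the other two drop to 0.3, so the probabilities still sum to one.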


Sequential  dynamic  logic  algorithm 


One of the disadvantages of the DL algorithm is the requirement to
process all available data in a batch mode. It turns out that the
algorithm can be modified for online data processing.

The algorithm given by (3)-(5) is modified for processing one input at
a time. This turns the algorithm into stochastic gradient ascent. Such
an algorithm can be made more efficient by introducing a momentum term
(Haykin 1999), which helps “remember” the derivatives obtained from
processing previous inputs and makes the gradient ascent trajectory
smoother. Thus, the sequential DL algorithm is given as follows.


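$$f_{hn} = \frac{p_h \, g(x_n | M_h)}{\sum_{h'=1}^{H} p_{h'} \, g(x_n | M_{h'})}, \quad \forall h \tag{8}$$

$$D_{rh} = \frac{f_{hn}}{p_h} + \beta_r D_{rh}, \quad \forall h \tag{9}$$

$$p_h = p_h + \varepsilon_r D_{rh}, \quad \forall h \tag{10}$$

$$D_h = f_{hn} \frac{\partial \log g(x_n, M_h)}{\partial S_h} + \beta D_h, \quad \forall h \tag{11}$$

$$S_h = S_h + \varepsilon D_h, \quad \forall h \tag{12}$$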

Equations (8)-(12) are applied to each input. Note that (8) is the same
as (3) combined with (2). The gradients are computed based on a single
input and the discounted values of the gradients from the previous
iteration. New parameters $\beta$ and $\beta_r$ are introduced for the
momentum terms.
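The momentum-smoothed parameter update of Eqs. (11)-(12) can be sketched for a single model. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a one-dimensional zero-mean Gaussian parameterized by its log-variance `s`, the association weight `f` is fixed to 1 because only one model is present, and the function names are hypothetical. The learning rate and momentum defaults follow the values quoted for the sequential DL experiments.

```python
import numpy as np

def grad_log_gaussian(x, s):
    """d/ds of log N(x; 0, exp(s)) for a 1-D zero-mean Gaussian (s = log-variance)."""
    return 0.5 * (x * x * np.exp(-s) - 1.0)

def sequential_fit(xs, s0=0.0, eps=0.001, beta=0.9, f=1.0):
    """Stochastic gradient ascent with momentum, one input at a time."""
    s, d = s0, 0.0
    for x in xs:
        d = f * grad_log_gaussian(x, s) + beta * d   # discounted running gradient
        s = s + eps * d                              # parameter step
    return s

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 2.0, size=5000)   # synthetic data with true variance 4
s_hat = sequential_fit(xs)
```

After the pass over the data, `np.exp(s_hat)` should lie near the true variance, up to the noise inherent in a stochastic update.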


Operant learning dynamic logic (OL-DL) algorithm

Conceptually, the sequential DL procedure described in the previous
section processes each input in two steps:

•  Step 1: Equations (8)-(10): The input is compared with all models
using Bayes' theorem, and the association weights between the input and
the models are adjusted. These weights are used to adjust the model
probabilities $p_h$.

•  Step 2: Equations (11)-(12): The parameters of the models are
adjusted using gradient ascent, weighted by the association weights.
The choice of new parameters is guided by the relative similarity
between the input and the models.

In previous studies (Deming 1998; Deming et al. 2007a, b; Perlovsky and
Deming 2007; Ilin and Deming 2010), DL has been applied to scenarios
with data inputs coming from multiple sources, such as target and
clutter, and the task was to simultaneously estimate the parameters of
all of the models. The probabilities $p_h$ converge to the relative
proportions of the data from the different sources. Consider a scenario
where all the data come from a single source. The algorithm will
eventually identify the model best corresponding to that single source
by adjusting its a priori probability to 1, while the a priori
probabilities of all other models will become 0. This is equivalent to
solving a model selection problem. In terms of operant conditioning,
this is equivalent to learning the appropriate behavior through
experience by adjusting internal parameters.

Dynamic logic (DL) is formulated as a Bayesian model selection
framework. Based on the above conceptual description, we can transform
DL into a two-stage OL framework, such as described in (Zhang 2009b).
The first stage consists of the selection of the model, and the second
stage consists of the selection of the model parameters. The repertoire
of actions in the first stage consists of selecting one of $H$ models,
with model probabilities given by $p_h$. The repertoire of actions in
the second stage consists of all possible parameter selections. In
order to overcome the difficulty of considering an infinite number of
possible actions, we assume that the parameters of each model can take
values from a reasonably large but finite domain. Suppose that there
are $N_{hs}$ possible choices of parameters of model $h$. The action
probabilities for model $h$ are given by $p_{s|h}$,
$s = 1, \ldots, N_{hs}$.

Suppose that model $h$ and model parameters $s$ have been selected at a
given step. The change of action probabilities is given by the
following expressions, similar to (6a)-(6b) in (Zhang 2009b):


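$$p_h^{n+1} = p_h^{n} + \varepsilon \theta_n (1 - p_h^{n}) \tag{13}$$

$$p_{h'}^{n+1} = p_{h'}^{n} - \varepsilon \theta_n p_{h'}^{n}, \quad h' \neq h \tag{14}$$

$$p_{s|h}^{n+1} = p_{s|h}^{n} + \varepsilon \theta_n \frac{1 - p_{s|h}^{n}}{p_h^{n+1}} \tag{15}$$

$$p_{s'|h}^{n+1} = p_{s'|h}^{n} - \varepsilon \theta_n \frac{p_{s'|h}^{n}}{p_h^{n+1}}, \quad s' \neq s \tag{16}$$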

Unlike in the common operant conditioning scenario, the reinforcement
signal now does not originate from the environment. Rather, it is
computed according to the DL principles, based on how similar the
selected model with the selected parameters is to the observed input.
It is therefore a function of $g(x_n, M_h)$. The exact functional form
can be specified in many different ways; in this work we define the
reinforcement as follows:

$$\theta_n = \left(A + \max\left(-A, \log g(x_n | M_h)\right)\right)^{2} \tag{17}$$

In Eq. (17), the constant $A$ is an empirically selected cutoff value
for the logarithm of the likelihood function. The transformation in
(17) ensures that the reward is non-negative, which is necessary for
applying operant reinforcement learning. The power function is applied
to increase the efficiency of the algorithm by making the difference
between high and low likelihood inputs larger. The reward structure
suggested here has a simpler functional form compared to our previous
publication (Ilin et al. 2011).

Equations (13)-(17) define the OL version of the DL framework (OL-DL),
which is a two-stage process. The first stage results in the adjustment
of the a priori probabilities of the models. The second stage adjusts
the parameter probabilities of the selected model. The reward is
intrinsic and is a function of the similarity between the model and the
input, in accordance with the knowledge instinct principle (Perlovsky
2001). The repeated application of the algorithm leads to learning the
correct model and its parameters, resulting in increased average
reward, again in accordance with the DL principles. The new formulation
is simpler and computationally more efficient than the sequential
algorithm in (8)-(12). Note that during each input presentation only
one similarity function is computed, for the selected model, as opposed
to the need to compute all similarities in (8).
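The two-stage procedure can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the reward follows the clip-shift-power form described for Eq. (17) with the exponent exposed as a parameter `power` (its default of 2 is an assumption), the conditional update uses the step `eps * theta / p_h^{n+1}`, which keeps the conditional probabilities normalized, and `log_density`, `SIGMAS`, and the toy one-dimensional models are hypothetical.

```python
import numpy as np

def reward(log_g, A=10.0, power=2.0):
    """Clip the log-likelihood at -A, shift by A, raise to a power (assumed exponent)."""
    return (A + max(-A, log_g)) ** power

def ol_dl_step(p_model, p_param, x, log_density, eps, rng, A=10.0, power=2.0):
    """One OL-DL step on input x.

    p_model: array (H,) of model probabilities p_h.
    p_param: list of H arrays, p_param[h][s] = p_{s|h}.
    log_density(h, s, x): hypothetical callback returning log g(x | M_h)
    for parameter choice s.
    """
    h = rng.choice(len(p_model), p=p_model)          # stage 1: select a model
    s = rng.choice(len(p_param[h]), p=p_param[h])    # stage 2: select its parameters
    step = eps * reward(log_density(h, s, x), A, power)
    # Model probabilities: selected model reinforced, others shrink.
    new_p_model = p_model * (1.0 - step)
    new_p_model[h] = p_model[h] + step * (1.0 - p_model[h])
    # Parameter probabilities of the selected model, step scaled by 1 / p_h^{n+1}.
    cond_step = step / new_p_model[h]
    new_cond = p_param[h] * (1.0 - cond_step)
    new_cond[s] = p_param[h][s] + cond_step * (1.0 - p_param[h][s])
    new_p_param = list(p_param)
    new_p_param[h] = new_cond
    return new_p_model, new_p_param, (h, s)

# Toy setup (hypothetical): two 1-D zero-mean Gaussian "models" with
# different candidate standard deviations.
SIGMAS = [np.array([0.5, 1.0, 2.0]), np.array([5.0, 10.0])]

def log_density(h, s, x):
    sigma = SIGMAS[h][s]
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * (x / sigma) ** 2

rng = np.random.default_rng(1)
p_model = np.array([0.5, 0.5])
p_param = [np.full(3, 1.0 / 3.0), np.full(2, 0.5)]
for x in rng.normal(0.0, 1.0, size=500):
    p_model, p_param, _ = ol_dl_step(p_model, p_param, x, log_density,
                                     eps=1e-3, rng=rng)
```

Only one log-density is evaluated per input, for the selected model and parameter choice, which is the source of the computational saving discussed in the text.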


Computational  efficiency 

Unlike the sequential version of the DL algorithm, the OL-DL algorithm
does not involve the derivatives of the models. Therefore, it is
arguably more biologically plausible. It can also be extended to more
than two hierarchical levels in order to handle more complex models.
These are the main motivations behind deriving the algorithm. In terms
of computational efficiency, the number of operations in the DL
algorithm is $O(ENM)$, where $N$ is the size of the data, $E$ is the
number of training epochs, that is, the number of times each data point
is presented to the algorithm, and $M$ is the number of models (denoted
$H$ above). In the case of OL-DL, the number of operations is $O(EN)$.
The constant hidden inside the big-O notation may be larger for OL-DL,
as the speed of learning depends on the learning constant
$\varepsilon$, and consequently OL-DL may require more epochs to
converge. The results obtained for a simple problem given in the next
section show that the difference in the number of epochs is not
significant.
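As a rough worked example, the operation counts can be evaluated for the demonstration problem of the next section, taking $N = 800$, $M = 3$, and the mean epoch counts reported in Table 1. The count below considers only similarity evaluations and ignores all constant factors, so it is an order-of-magnitude illustration rather than a timing result.

```latex
\begin{align*}
\text{sequential DL:}\quad & E \cdot N \cdot M \approx 9.55 \times 800 \times 3 = 22{,}920 \\
\text{OL-DL:}\quad & E \cdot N \approx 9.15 \times 800 = 7{,}320
\end{align*}
```

Under these assumptions the OL-DL algorithm needs roughly a third of the similarity evaluations of the sequential DL algorithm.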


Demonstration  of  results  with  the  OL-DL  algorithm 

Description  of  the  data  set  and  applied  models 

Let us consider the following example to illustrate the operation of
the algorithms. The data consist of $N = 800$ two-dimensional points
originating from a stochastic source with unknown probability
distribution. We experiment with three kinds of distributions: (a)
uniform, (b) Gaussian with full covariance matrix, and (c) Gaussian
with diagonal covariance matrix. Examples of the three kinds of data
sets are shown in Fig. 1. We assumed that the mean value of all three
data sources is 0 and is known to the algorithms.

The data points (“sensory inputs”) are presented one by one, and the
task is to learn the most appropriate model for these data. Since we
are dealing with synthetic data, the repertoire of possible models
includes: (1) uniform probability density, (2) Gaussian probability
density with full covariance matrix, and (3) Gaussian probability
density with diagonal covariance matrix. The uniform probability
density is the same for all data points and equals the inverse of the
area covered by the data. The area is updated as new data points are
received by computing the minimum and the maximum coordinate in each
dimension. Data are presented to the algorithm in random order that
changes after each cycle (epoch) through all $N$ points. In the case of
the sequential DL algorithm, the learning rates were set to the
following values: $\varepsilon = 0.001$, $\varepsilon_r = 0.01$,
$\beta = 0.9$, $\beta_r = 0.5$.
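The data sources and the running uniform model described above can be sketched as follows. The ranges and covariance matrices used here are illustrative assumptions, since the text specifies only the three distribution families, the zero mean, and $N = 800$; `make_data` and `RunningUniform` are hypothetical names.

```python
import numpy as np

def make_data(kind, n=800, rng=None):
    """Draw n zero-mean 2-D points from one of the three source types."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0, size=(n, 2))      # assumed range
    if kind == "gaussian_full":
        cov = np.array([[1.0, 0.6], [0.6, 1.0]])        # assumed covariance
    elif kind == "gaussian_diag":
        cov = np.array([[1.0, 0.0], [0.0, 0.25]])       # assumed covariance
    else:
        raise ValueError(kind)
    return rng.multivariate_normal(np.zeros(2), cov, size=n)

class RunningUniform:
    """Uniform density = 1 / bounding-box area, updated as points arrive."""

    def __init__(self):
        self.lo = np.full(2, np.inf)
        self.hi = np.full(2, -np.inf)

    def update(self, x):
        # Track the minimum and maximum coordinate in each dimension.
        self.lo = np.minimum(self.lo, x)
        self.hi = np.maximum(self.hi, x)

    def density(self):
        area = float(np.prod(self.hi - self.lo))
        if not np.isfinite(area) or area <= 0.0:
            raise ValueError("need points spanning a non-zero area")
        return 1.0 / area

points = make_data("uniform", rng=np.random.default_rng(0))
model = RunningUniform()
for x in points:
    model.update(x)
uniform_density = model.density()
```

For the assumed range of [-1, 1] in each dimension, the bounding box can never exceed an area of 4, so the estimated density approaches 0.25 from above as more points arrive.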
Fig. 1 Data samples from the three cases of two-dimensional probability density functions considered in this paper. Panels: Case A, uniform data; Case B, Gaussian data; Case C, Gaussian diagonal data


Results 

The sequential DL and the OL-DL algorithms described above have been
implemented in Matlab with the three models. The execution of both
algorithms can be illustrated by considering the evolution of the a
priori model probabilities $p_h$. Since there are three possible
models, we use the probability simplex, which is a triangle
corresponding to the set of non-negative points adding to one. The
computation starts with a point close to the center of the triangle. In
the process of applying either algorithm, the probability of the model
that corresponds to the data source increases to a value close to 1,
while the probabilities of the other two models decrease to values
close to 0. Figure 2 illustrates the performance of the OL-DL algorithm
with all three types of data source. We ran 50 experiments for each of
the cases and, as the figure shows, the correct models are selected in
each case. Similar results were produced for the sequential DL
algorithm.
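For plotting trajectories on the probability simplex, each probability triple can be mapped to a point inside an equilateral triangle using barycentric coordinates. The sketch below assumes a particular placement of the triangle and a particular assignment of models to vertices; both are illustrative choices.

```python
import numpy as np

# Corners of an equilateral triangle; row h is the point where p_h = 1.
VERTICES = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.5, np.sqrt(3.0) / 2.0]])

def simplex_to_xy(p):
    """Map (p_1, p_2, p_3) with p_1 + p_2 + p_3 = 1 to 2-D plot coordinates."""
    p = np.asarray(p, dtype=float)
    return p @ VERTICES

center = simplex_to_xy([1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0])   # starting point
corner = simplex_to_xy([0.0, 0.0, 1.0])                     # third model selected
```

A run that starts with equal probabilities begins at the centroid of the triangle and, as one model is selected, moves toward the corresponding corner.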

In order to apply the OL version of the DL algorithm, we made the model
parameters discrete, as follows. The covariance matrix has three
independent parameters. Each parameter is quantized over a certain
range, and a covariance matrix for each parameter combination is
generated and stored in computer memory. We experimented with different
sets of possible covariance matrices. The results reported here employ
a set of 36 matrices. Some of the possible choices for the covariance
matrix are shown in Fig. 3, for both the diagonal and the full
covariance matrix. The diagonal covariance model has also been made
discrete, with 28 possible choices. We also experimented with a larger
number of choices, up to 300, producing similar results. Although the
number of possible action choices may be an important parameter, its
influence will be studied in future work. If the number of choices
should become too large, hierarchical schemes for model design will be
implemented.
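One way to build such a finite repertoire is to quantize the two standard deviations and the correlation coefficient and to form every combination. The grid values below are hypothetical and were chosen only so that the counts come out to 36 and 28; the actual quantization ranges are not given in the text.

```python
import itertools
import numpy as np

def covariance_grid(sigma_x_values, sigma_y_values, rho_values):
    """Return one 2x2 covariance matrix per (sigma_x, sigma_y, rho) combination."""
    grid = []
    for sx, sy, rho in itertools.product(sigma_x_values, sigma_y_values, rho_values):
        off = rho * sx * sy                      # off-diagonal covariance term
        grid.append(np.array([[sx ** 2, off], [off, sy ** 2]]))
    return grid

# Hypothetical grids: 3 x 3 x 4 = 36 full matrices, 4 x 7 x 1 = 28 diagonal ones.
full_choices = covariance_grid([0.5, 1.0, 2.0], [0.5, 1.0, 2.0],
                               [-0.6, -0.2, 0.2, 0.6])
diag_choices = covariance_grid([0.5, 1.0, 1.5, 2.0], np.linspace(0.5, 2.0, 7),
                               [0.0])
```

Keeping the correlation strictly between -1 and 1 guarantees that every matrix in the grid is positive definite and therefore a valid covariance.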


The parameters of the OL algorithm are set as follows:
$\varepsilon = 0.00001$, $A = 10$. The initial action probabilities are
assumed to be equal, and are set to $p_h = 1/3$,
$p_{s|h} = 1/N_{hs}$. Simulation showed that the probability of the
correct model increases to 1 and the probabilities of the other two
models decrease to 0. Similarly, the probability of the correct model
parameters increases to 1. This process, along with the reward as a
function of iteration, is shown in Fig. 4. The reinforcement signal
jumps up and down depending on the model and the model parameters
selected in each iteration. In order to see the trend, we smoothed the
time series $\theta_n$ with a moving average filter. The reward
increases steadily over the execution of the algorithm.
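A moving average of the reward sequence can be computed with a simple box filter. The window of 50 follows the caption of Fig. 4; the function name and the use of a "valid" convolution, which drops the incomplete windows at the ends, are choices made for this sketch.

```python
import numpy as np

def moving_average(values, window=50):
    """Average each run of `window` consecutive values (box filter)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

smoothed = moving_average(np.arange(10), window=5)
```

For the ten values 0 to 9 and a window of 5, the result contains the six window means 2, 3, 4, 5, 6, and 7.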

We measured the number of epochs necessary for the algorithms to
converge. The convergence criterion was met when one of the model
probabilities $p_h$ exceeded 0.99. The results for sequential DL and
OL-DL are shown in Table 1.

Table 1 shows that the number of times the full data set had to be
presented to the algorithm was not significantly different between the
two algorithms. However, the OL-DL algorithm performs fewer
computations because it only updates one of the models in each
iteration.


Conclusions  and  discussion 

This contribution explored the connection between two computational
frameworks for model selection: DL and operant conditioning. When
applied to a problem involving simultaneous model selection and
parameter estimation, the DL framework is conceptually similar to a
two-stage OL sequence with an intrinsically generated reinforcement
signal. In its existing formulation, DL is a Bayesian learning
framework. In this work we reformulated dynamic logic in terms of OL,
at the same time preserving its main principles.

Fig. 2 Evolution of a priori probabilities $p_h$ in case of sequential DL algorithm. Model 1 corresponds to the uniform, Model 2 to the Gaussian with full covariance, Model 3 to the Gaussian with diagonal covariance. The triangle represents the probability simplex, where all points satisfy the relation $p_1 + p_2 + p_3 = 1$

Fig. 3 Illustration of several possible choices for the covariance matrix for the full and diagonal covariance model. Panels: examples of Gaussian models with full covariance matrix; examples of Gaussian models with diagonal covariance matrix. Sample data are displayed in the background. The ellipses correspond to the 2 standard deviation ellipses of the respective two-dimensional Gaussian distributions

Fig. 4 Illustration of the OL dynamic logic algorithm (OL-DL). Top: model probabilities $p_h$. Middle: Gaussian model parameter probabilities $p_{s|h}$ for $h$ corresponding to the true model. Bottom: moving average of the reward signal. The average value has been determined over 50 iterations

Table 1 Convergence of the algorithms

Algorithm        Number of epochs to convergence (mean ± standard deviation)
Sequential DL    9.55 ± 2.15
OL-DL            9.15 ± 3.75

The principles investigated include (1) intrinsic reward, stemming from
the postulated existence of the knowledge instinct, and (2) the
vague-to-crisp learning process for concept formation. The intrinsic
reward increases with increased similarity between the input data and
the selected model. The initial model selection and parameter selection
probabilities are set to be equal, making all model choices equally
possible. They are “vague concepts” in DL terms. The final result is a
single model and a single parameter value selected with probability
one, corresponding to crisp concepts. We illustrated the operation of
the new OL-DL algorithm through an example of model selection and
parameter optimization. We demonstrated that the combined algorithm
provides an alternative formulation of the vague-to-crisp DL learning
process, with the added advantage of reducing computational complexity
and realizing online processing.

Neural mechanisms of DL and the corresponding operant conditioning
behaviour are of significant interest to researchers in the
neuroscience of decision making. Biological mechanisms of learning,
including OL, have been studied in the context of perceptual
processing; see, e.g., (Stemme et al. 2011; Neiman and Loewenstein
2013). The idea of vague-to-crisp transitions found support in previous
neuroimaging studies in the field of visual processing (Bar et al.
2006). In addition to early visual cortex, these authors identified the
involvement of the fusiform gyrus and the prefrontal cortex (PFC) in
the operation of this mechanism. They hypothesized that vague
representations propagate fast via the magnocellular dorsal pathway
(bottom-up signal) from early visual cortex to the PFC, in addition to
the more systematic and slower propagation along the ventral visual
pathway. Bottom-up and top-down signals are integrated in
object-processing regions of the occipital-temporal cortex (fusiform
gyrus). Kveraga et al. (2011), in their study of the processing of
contextual information, also identified vague-to-crisp mechanisms,
involving the parahippocampal, retrosplenial, and medial prefrontal
cortices.
A variety of representations for internal models have been considered
(Rao et al. 2002), including inverse dynamics and forward models. The
OL-DL mechanisms considered in this paper support these types of
models, including parametric, analytic, and probabilistic ones. OL-DL
models the mechanisms of bottom-up and top-down signal interaction: in
the top-down stream, representations serve as forward models; in the
bottom-up stream, representations serve as inverse models. A salient
innovative property of OL-DL is that it supports contemporaneous
learning of multiple models and assignment/association of data with
models while avoiding the combinatorial complexity that used to be a
typical difficulty of this type of learning problem (Perlovsky et al.
2011).

Levine and Perlovsky (2008) discussed functions of the orbitofrontal
cortex (OFC), anterior cingulate cortex (ACC), and the dorsolateral
prefrontal cortex (DLPFC) in terms of the postulated “knowledge
instinct”. The arguments there appear to have drawn support from recent
results on the involvement of dopamine and opioid neurons (Litman 2005;
Fiorillo 2011). At least two separate motivational systems are
involved: “wanting”, associated with mesolimbic dopamine activation,
and “liking”, associated with nucleus accumbens opioid activation
(Berridge and Robinson 1998, 2003; Berridge 2004; Tindell et al. 2005;
Zhang et al. 2009). It is likely that OL-DL as well as the mechanisms
of the knowledge instinct involve both cognitive and motivational
neural substrates. The above discussion is only a step toward
understanding these mechanisms, and opens up a wide area for future
research.

Models based on attractor dynamics have been used successfully for
describing cognitive processing (Kay et al. 1995; Satel et al. 2009; Li
and Nara 2008). It is anticipated that our proposed DL approach is of
relevance to neurodynamics in a broader context, exemplified by the
pioneering experimental and theoretical brain research by Freeman
(1999). According to Freeman’s dynamical system models of neural
dynamics, cognition is described through a trajectory moving across a
convoluted attractor landscape with valleys corresponding to memory
patterns (Freeman 1975; Skarda and Freeman 1987; Kozma et al. 2003). In
the basal mode of the intentional neurodynamic process, the brain is in
a high-dimensional dynamic state, and the trajectory of the system
explores the dynamic attractor landscape. This can be described as a
gaseous chaotic state (Kozma et al. 2012), which may be interpreted as
a vague perceptual state according to DL. When an input pattern is
presented to the system, the oscillations undergo a phase transition
and the trajectory is switched to a localized memory wing, which is
described as condensation to a liquid-like cognitive state. This phase
transition corresponds to the act of identification and decision
making, and can be associated with the formation of a crisp state in
terms of DL. The new state is maintained for some time, until
conditions for a new quick switch are ready, when the whole cognitive
cycle starts again. In this context, the vague-to-crisp transition can
be viewed as a manifestation of the perceptual transition when the
input data are perceived and identified in the brain. Freeman’s
attractor dynamics approach suggests a consistent framework for the
perceptual learning process, but does not specify the model selection
details. In the present work, the operant conditioning process suggests
a method for implementing the above perceptual mechanism. Future work
aims at a comprehensive evaluation of the advantages of the OL-DL
algorithm with respect to alternative learning approaches. Our results
advance the OL and DL frameworks with respect to both computational
efficiency and biological plausibility, especially in the framework of
the intentional neurodynamic and cognitive computing paradigms.
Appendix 1: Dynamic logic iteration


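Let us consider the logarithm of the total similarity (1):

$$\log L(x, M(S)) = \sum_{n=1}^{N} \log \sum_{h=1}^{H} g(X_n, M_h) \qquad (18)$$

The derivative of (18) with respect to the parameter $S_h$
of model $M_h$ can be written as follows:

$$\frac{\partial \log L}{\partial S_h} = \sum_{n=1}^{N} \frac{1}{\sum_{h'=1}^{H} g(X_n, M_{h'})} \frac{\partial g(X_n, M_h)}{\partial S_h} = \sum_{n=1}^{N} \frac{g(X_n, M_h)}{\sum_{h'=1}^{H} g(X_n, M_{h'})} \frac{\partial \log g(X_n, M_h)}{\partial S_h} \qquad (19)$$

Introduce the following quantity:

$$f_{hn} = \frac{g(X_n, M_h)}{\sum_{h'=1}^{H} g(X_n, M_{h'})} \qquad (20)$$

The gradient is now expressed as follows:

$$\frac{\partial \log L}{\partial S_h} = \sum_{n=1}^{N} f_{hn} \frac{\partial \log g(X_n, M_h)}{\partial S_h} \qquad (21)$$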
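As a worked illustration of (21), suppose the conditional similarity is a Gaussian density with mean $\mu_h$ and a fixed variance $\sigma^2$. This Gaussian form, and the symbols $\mu_h$ and $\sigma$, are assumptions made here for illustration only; the appendix itself does not fix the form of $g$.

```latex
g(X_n, M_h) = \frac{1}{\sqrt{2\pi\sigma^2}}
              \exp\!\left(-\frac{(X_n - \mu_h)^2}{2\sigma^2}\right),
\qquad
\frac{\partial \log g(X_n, M_h)}{\partial \mu_h} = \frac{X_n - \mu_h}{\sigma^2},
\\[4pt]
\frac{\partial \log L}{\partial \mu_h}
  = \sum_{n=1}^{N} f_{hn}\,\frac{X_n - \mu_h}{\sigma^2}.
```

Under this assumption, setting the gradient to zero with the weights $f_{hn}$ held fixed gives $\mu_h = \sum_n f_{hn} X_n / \sum_n f_{hn}$, that is, the mean of the data weighted by the association of each point with model h.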


Instead of performing gradient ascent using (21) directly, we can
alternate two steps: first compute the quantities in (20) with the
model parameters held fixed, and then perform a gradient ascent step
with the quantities $f_{hn}$ held fixed. These quantities are
referred to as “association weights” in the main text. The
resulting procedure is what is given by (3)-(5). The proof of
convergence of this procedure is given in (Perlovsky 2001),
and is not repeated here due to space constraints. Note that $f_{hn}$
is the posterior probability of the model h given the observed
value $X_n$. The algorithm can alternatively be derived using the
principle of expectation maximization, although this is not
how it was originally derived.
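The alternation between (20) and (21) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the models are taken to be one-dimensional, unit-variance Gaussians whose only parameter is the mean, and the function names, step size, and toy data are hypothetical.

```python
import math

def g(x, mean):
    """Conditional similarity g(X_n, M_h); a unit-variance Gaussian is assumed."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def log_similarity(xs, means):
    """Equation (18): sum over data points of the log of the sum over models."""
    return sum(math.log(sum(g(x, m) for m in means)) for x in xs)

def association_weights(xs, means):
    """Equation (20): weights[n][h] = g(X_n, M_h) / sum over h' of g(X_n, M_h')."""
    weights = []
    for x in xs:
        vals = [g(x, m) for m in means]
        total = sum(vals)
        weights.append([v / total for v in vals])
    return weights

def dl_iterate(xs, means, step=0.1, iterations=50):
    """Alternate (20) with parameters fixed and an ascent step (21) with weights fixed."""
    means = list(means)
    history = [log_similarity(xs, means)]
    for _ in range(iterations):
        f = association_weights(xs, means)  # weights from the current parameters
        for h in range(len(means)):
            # d log g / d mean = (x - mean) for a unit-variance Gaussian
            grad = sum(f[n][h] * (x - means[h]) for n, x in enumerate(xs))
            means[h] += step * grad  # gradient ascent step, weights held fixed
        history.append(log_similarity(xs, means))
    return means, history

# Toy data with two clusters; the initial means are deliberately vague.
xs = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
fitted_means, history = dl_iterate(xs, means=[-0.5, 0.5])
print(fitted_means, history[0], history[-1])
```

With a small enough step, each pass moves every mean part of the way toward its weighted average of the data, so the log-similarity recorded in `history` does not decrease in this Gaussian setting.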


Appendix 2: Sketch of proof of OL-DL convergence

The algorithm given by (13)-(16) consists of a two-step
action selection: a model, with index h, is selected first, and
then the parameters of that model, with index s.

After the model and the parameters are selected, the
reward value $\theta_n$ is computed. We can show that executing the
algorithm (13)-(16) results in an increase of the expected
reward. Indeed, consider the following quantity:

$$E[\Delta\theta_n] = E[\theta_n - \theta_{n-1}]. \qquad (22)$$

This is the expected value of the difference between the
reward received in the current and in the previous iteration.
The reward is a function of the selected model and model
parameters, $\theta_n^{hs} = \theta^{hs}(x_n)$. The expectation can be written by
summing and integrating over the probability distributions
associated with all the random quantities, namely the input data
and the selected model and its parameters:

$$E[\theta_{n+1} - \theta_n] = \int_{x_n} \sum_{h=1}^{H} \sum_{s=1}^{M} \left( p_h^{n+1}\, p_{s|h}^{n+1}\, \theta_{n+1}^{hs} - p_h^{n}\, p_{s|h}^{n}\, \theta_n^{hs} \right) p(x_n)\, dx_n. \qquad (23)$$

This expression can be rewritten by (1) bringing the
integral inside the summations, since it is only the reward
that depends on the input, and (2) substituting the
expressions (14)-(16) for the next action probabilities. We
also need to separate the terms that correspond to the
selected model and model parameters, with indices h and s,
from the terms for the rest of the action and parameter
selections, with indices h' and s'. We omit the
explicit dependency of the reward on the input to make the
expressions clearer.

(24)


The first term in (24) corresponds to the selected model
and parameter actions. The second term corresponds to the
parameter actions of the selected model that were not
selected. The third term corresponds to the models that
were not selected, and the fourth term is the expected
reward from the previous time step. Note that the expected
value of the reward does not depend on the time step:

$$\int \theta_n^{hs}\, p(x_n)\, dx_n = \int \theta_{n+1}^{hs}\, p(x_{n+1})\, dx_{n+1} = E[\theta^{hs}]. \qquad (25)$$

Taking  (25)  into  consideration,  opening  the  parenthesis, 
and  disregarding  the  terms  with  we  can  show  that  the 
following  expression  holds: 

$$E[\Delta\theta_n] > 0. \qquad (26)$$

This means that the algorithm (13)-(16) results in an
increase of the expected reward and therefore leads to the
selection of the best possible action sequence.
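The claim that the expected reward does not decrease can be checked numerically on a toy problem. The sketch below is an illustration under loudly stated assumptions: it uses a generic linear reward-inaction update as a stand-in, not the update rules (13)-(16) of the main text, and the reward table, learning rate, and all function names are hypothetical. The rewards are taken to lie in [0, 1] and to be already averaged over the input, in the spirit of (25).

```python
import random

def expected_reward(p_model, p_param, reward):
    """Expected reward under the current two-step selection probabilities."""
    return sum(
        p_model[h] * p_param[h][s] * reward[h][s]
        for h in range(len(p_model))
        for s in range(len(p_param[h]))
    )

def reinforce(probs, chosen, amount):
    """Linear reward-inaction step: shift probability mass toward the chosen index."""
    return [
        p + amount * (1.0 - p) if i == chosen else p * (1.0 - amount)
        for i, p in enumerate(probs)
    ]

def update(p_model, p_param, h, s, r, rate):
    """Reinforce the selected model h and its selected parameter s by rate * reward."""
    new_model = reinforce(p_model, h, rate * r)
    new_param = [list(row) for row in p_param]
    new_param[h] = reinforce(p_param[h], s, rate * r)
    return new_model, new_param

def expected_change(p_model, p_param, reward, rate):
    """Exact expected one-step change, enumerating every possible (h, s) selection."""
    base = expected_reward(p_model, p_param, reward)
    total = 0.0
    for h, ph in enumerate(p_model):
        for s, ps in enumerate(p_param[h]):
            new_m, new_q = update(p_model, p_param, h, s, reward[h][s], rate)
            total += ph * ps * (expected_reward(new_m, new_q, reward) - base)
    return total

def select(p_model, p_param, rng):
    """Two-step action selection: draw a model, then a parameter of that model."""
    h = rng.choices(range(len(p_model)), weights=p_model)[0]
    s = rng.choices(range(len(p_param[h])), weights=p_param[h])[0]
    return h, s

# Hypothetical rewards in [0, 1] for 2 models with 3 parameter choices each.
reward = [[0.2, 0.9, 0.4], [0.5, 0.3, 0.1]]
p_model = [0.5, 0.5]
p_param = [[1 / 3] * 3, [1 / 3] * 3]
rate = 0.05
rng = random.Random(0)

drift = expected_change(p_model, p_param, reward, rate)  # analogue of (26)
for _ in range(2000):
    h, s = select(p_model, p_param, rng)
    p_model, p_param = update(p_model, p_param, h, s, reward[h][s], rate)
final_drift = expected_change(p_model, p_param, reward, rate)
print(drift, final_drift, expected_reward(p_model, p_param, reward))
```

For this stand-in update, the expected one-step change `drift` is non-negative for any probabilities, although an individual step can lower the expected reward when a below-average action happens to be drawn.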